Quality Control

Dataset quality control was performed with DADA2's built-in QC functions. The mini-pipeline trims the first 15 bp from the 5' (left) end of each read (suggested for IonTorrent data), discards reads shorter than 50 bp, denoises the reads and finally removes chimeric reads. FastQ Screen was used to map the QC-filtered FASTQ reads to reference contaminant sequences (hg19, mm10).
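The trimming and length-filtering steps can be illustrated with a minimal Python sketch. In DADA2 itself (an R package) these options correspond to the `trimLeft` and `minLen` arguments of `filterAndTrim()`. The sketch below only mirrors that logic and assumes, for illustration, that reads are held as (id, sequence, quality) tuples. Denoising and chimera removal are not reproduced here.

```python
from typing import Iterable, Iterator, Tuple

# Illustrative read representation (an assumption, not DADA2's data model):
# (read id, sequence, per-base quality string)
Read = Tuple[str, str, str]

TRIM_LEFT = 15  # bases removed from the 5' end (suggested for IonTorrent data)
MIN_LEN = 50    # reads shorter than this after trimming are discarded


def trim_and_filter(reads: Iterable[Read],
                    trim_left: int = TRIM_LEFT,
                    min_len: int = MIN_LEN) -> Iterator[Read]:
    """Trim the first `trim_left` bases of each read and drop reads left too short."""
    for read_id, seq, qual in reads:
        seq, qual = seq[trim_left:], qual[trim_left:]
        # As with minLen in DADA2, the length check is applied after trimming.
        if len(seq) >= min_len:
            yield read_id, seq, qual


reads = [("r1", "A" * 80, "I" * 80), ("r2", "C" * 60, "I" * 60)]
kept = list(trim_and_filter(reads))
print([(rid, len(seq)) for rid, seq, _ in kept])  # [('r1', 65)]
```

Read r2 is dropped because only 45 bases remain after the 15 bp trim.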

Data filtering

The table shows how many reads remained after each filtering step of the DADA2 pipeline used for taxonomic classification. The ‘Chimeras Filtered’ column gives the number of fully filtered reads used in the downstream analysis.
DADA2 reads filtering and QC-filtered contaminant reads

Sample              Total Input  QC Filtered  Denoised  Chimeras Filtered  #Bacteria  %Bacteria  #Human  %Human   #Mouse  %Mouse
Gut_C3_1_merged          998735       828496    814390             722564     602787      83.4%    7675    0.9%   130728   15.8%
Gut_C3_4_merged         4108448      3150534   2950625            2271789    1695998      74.7%  945042   30.0%    95385    3.0%
Liver_C3_1_merged        449654       403739    388635             335347     304382      90.8%    7232    1.8%    39075    9.7%
Liver_C3_2_merged       2199911      2001177   1958592            1845465     707620      38.3%   62379    3.1%  1189659   59.4%
Lung_C3_3_merged        1317409      1042628   1035637             943925     941161      99.7%    3667    0.4%     1888    0.2%
Lung_F3_1_merged        1838274      1606855   1593833            1379285    1294395      93.8%   11365    0.7%    88189    5.5%

Total Input to Chimeras Filtered: DADA2 read counts after each filtering step. #Bacteria and %Bacteria refer to the DADA2 reads; %Bacteria is relative to the Chimeras Filtered count. #Human and #Mouse are QC-filtered FastQ reads mapped by FastQ Screen to the human and mouse references; %Human and %Mouse are relative to the QC Filtered count.
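The percentage columns can be recomputed from the counts. The denominators used below are inferred from the numbers themselves rather than documented: %Bacteria matches #Bacteria divided by Chimeras Filtered, while %Human and %Mouse match the contaminant counts divided by QC Filtered. As an extra illustrative quantity, the sketch also derives the fraction of input reads that survive all DADA2 steps; only two samples are included for brevity.

```python
# Counts copied from the filtering table (two samples shown).
rows = {
    # sample: (total_input, qc_filtered, denoised, chimeras_filtered,
    #          n_bacteria, n_human, n_mouse)
    "Gut_C3_1_merged": (998735, 828496, 814390, 722564, 602787, 7675, 130728),
    "Liver_C3_2_merged": (2199911, 2001177, 1958592, 1845465, 707620, 62379, 1189659),
}


def percentages(total, qc, denoised, nonchim, bact, human, mouse):
    """Return (%Bacteria, %Human, %Mouse, % of input surviving all DADA2 steps)."""
    return (
        round(100 * bact / nonchim, 1),   # bacteria relative to Chimeras Filtered
        round(100 * human / qc, 1),       # human contaminants relative to QC Filtered
        round(100 * mouse / qc, 1),       # mouse contaminants relative to QC Filtered
        round(100 * nonchim / total, 1),  # overall DADA2 read retention
    )


for sample, counts in rows.items():
    print(sample, percentages(*counts))
# Gut_C3_1_merged (83.4, 0.9, 15.8, 72.3)
# Liver_C3_2_merged (38.3, 3.1, 59.4, 83.9)
```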

Contaminant Reads mapping

The graphs below depict mapping profiles of the analysed samples. Reference sequences used are hg19, mm10, hg19_16srRna, mm10_16srRna and E.Coli_16srRna.
Figure: Reads mapping distributions

Quality Control plots

Quality profile plots: the gray-scale heat map shows the frequency of each quality score at each base position. The green line shows the mean quality score at each position, and the orange lines show the quartiles of the quality score distribution. The red line shows the scaled proportion of reads that extend to at least that position.

Error-rate plots: these show the error rates for each possible transition (A→C, A→G, …). Points are the observed error rates for each consensus quality score, the black line shows the estimated error rates after convergence of the machine-learning algorithm, and the red line shows the error rates expected under the nominal definition of the Q-score.
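The red line in the error-rate plots follows from the Phred definition of the quality score, Q = -10 log10(p), where p is the probability that a base call is wrong. A minimal Python sketch of that mapping in both directions:

```python
import math


def nominal_error_rate(q: float) -> float:
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_score(p: float) -> float:
    """Inverse mapping: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    # Q10 -> 1 error in 10 calls, Q20 -> 1 in 100, Q30 -> 1 in 1000, ...
    print(f"Q{q}: expected error rate {nominal_error_rate(q):.4g}")
```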

Taxonomic Analysis

Taxonomic analysis followed the standard phyloseq (Bioconductor) pipeline and produced the following five basic visualisations.

Alpha diversity

Alpha diversity shows how many different species were detected within each sample, i.e. the diversity within a single microbial ecosystem.
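Two common alpha diversity measures are observed richness (the number of OTUs present) and the Shannon index, which also accounts for how evenly reads are spread across OTUs. phyloseq's `plot_richness()` can display both; which measures this report uses is not stated, so the Python sketch below is illustrative only, with hypothetical counts.

```python
import math
from typing import Sequence


def observed_richness(counts: Sequence[int]) -> int:
    """Number of OTUs with a non-zero count in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over the OTUs present in the sample."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


sample = [10, 10, 10, 10, 0]  # hypothetical OTU counts for one sample
print(observed_richness(sample))   # 4
print(round(shannon(sample), 3))   # 1.386, i.e. ln(4) for four equally abundant OTUs
```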

Beta diversity

Beta diversity shows how the microbial composition of one environment differs from that of another, based on the taxonomic Order of each species. Samples and species are shown in two side-by-side panels.
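Beta diversity comparisons start from a pairwise dissimilarity between samples, which is then ordinated for plotting. The report does not name the distance used; Bray-Curtis, shown in this minimal Python sketch with hypothetical counts, is one widely used choice and is an assumption here.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two samples' OTU abundance vectors.

    0 means identical composition; 1 means the samples share no OTUs.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same OTUs")
    diff = sum(abs(x - y) for x, y in zip(a, b))
    total = sum(x + y for x, y in zip(a, b))
    return diff / total


# Hypothetical OTU counts for two samples (same OTU order in both)
s1 = [10, 0, 5]
s2 = [0, 10, 5]
print(round(bray_curtis(s1, s2), 3))  # 0.667
```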

OTU abundance analysis

Abundance of the 30 most abundant OTUs across all samples. At each family's horizontal position, the abundance values of the individual OTUs are stacked from greatest to least and separated by a thin horizontal line. Stacking the values in order displays the total abundance while still representing the individual OTU abundances.
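The selection and ordering behind this bar plot can be sketched in Python: rank OTUs by total abundance across all samples, keep the top N (30 in the report), then group them by family with each stack ordered from greatest to least. OTU ids, family names and counts below are hypothetical, and in phyloseq the plot itself would be drawn with `plot_bar()` rather than by this code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_otus(abundance: Dict[str, int], n: int = 30) -> List[Tuple[str, int]]:
    """Return the n most abundant OTUs as (id, total abundance), largest first."""
    return Counter(abundance).most_common(n)


def stack_by_family(top: List[Tuple[str, int]],
                    family_of: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Group selected OTUs by family; `top` is already sorted, so each stack is too."""
    stacks: Dict[str, List[Tuple[str, int]]] = {}
    for otu, count in top:
        stacks.setdefault(family_of[otu], []).append((otu, count))
    return stacks


# Hypothetical totals across all samples and family assignments
abundance = {"OTU1": 50, "OTU2": 200, "OTU3": 120, "OTU4": 5}
family_of = {"OTU1": "FamilyA", "OTU2": "FamilyA", "OTU3": "FamilyB", "OTU4": "FamilyB"}

stacks = stack_by_family(top_otus(abundance, n=3), family_of)
for family, otus in stacks.items():
    # The bar height is the stack's sum; each segment is one OTU.
    print(family, sum(c for _, c in otus), otus)
```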

Dendrogram

To capture as much of the species diversity as possible, 200 OTUs randomly selected from the 1000 most abundant are shown in this figure. Any available species-level annotation is displayed next to the relevant point. Point size, shape and color indicate OTU abundance, sample and phylum, respectively.
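The OTU selection for this figure can be sketched as follows: rank OTUs by total abundance, keep the 1000 most abundant, then draw 200 of them uniformly at random without replacement. The fixed seed is an assumption added here for reproducibility; the report does not say whether one was used, and the OTU table is hypothetical.

```python
import random
from typing import Dict, List, Optional


def sample_top_otus(abundance: Dict[str, int], top_n: int = 1000, k: int = 200,
                    seed: Optional[int] = None) -> List[str]:
    """Randomly pick k OTUs (without replacement) from the top_n most abundant."""
    ranked = sorted(abundance, key=abundance.get, reverse=True)[:top_n]
    rng = random.Random(seed)  # seeded generator so the figure can be reproduced
    return rng.sample(ranked, min(k, len(ranked)))


# Hypothetical table of 1500 OTUs with increasing total abundance
abundance = {f"OTU{i}": i for i in range(1500)}
picked = sample_top_otus(abundance, seed=42)
print(len(picked), min(abundance[o] for o in picked) >= 500)  # 200 True
```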

Phyla structure network analysis

The network helps identify underlying structure in the co-occurrence of different phyla across all datasets. The graph represents the 200 most abundant OTUs.
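A co-occurrence network of this kind can be built by linking OTUs that occur in similar sets of samples. phyloseq's `make_network()` can do this using a Jaccard distance and a maximum-distance cut-off (`max.dist`); the Python sketch below mirrors that idea, with a 0.4 threshold and OTU/sample names that are illustrative assumptions. Nodes would then be colored by phylum to reveal phylum-level structure.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|, where A and B are the samples containing each OTU."""
    union = a | b
    if not union:
        return 1.0  # OTUs absent from every sample are treated as unrelated
    return 1 - len(a & b) / len(union)


def cooccurrence_edges(presence: Dict[str, Set[str]],
                       max_dist: float = 0.4) -> List[Tuple[str, str]]:
    """Link every pair of OTUs whose Jaccard distance is at most max_dist."""
    return [(u, v) for u, v in combinations(sorted(presence), 2)
            if jaccard_distance(presence[u], presence[v]) <= max_dist]


# Hypothetical presence/absence: which samples each OTU was detected in
presence = {
    "OTU1": {"S1", "S2", "S3"},
    "OTU2": {"S1", "S2", "S3"},
    "OTU3": {"S2", "S3", "S4"},
    "OTU4": {"S5"},
}
print(cooccurrence_edges(presence))  # [('OTU1', 'OTU2')]
```

Raising `max_dist` to 0.5 would also link OTU3 to OTU1 and OTU2, so the threshold directly controls how dense the network looks.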

Additional plots

Additional plots were generated to give a picture of the enzyme distributions and pathways involved in each dataset. PICRUSt2 was used to generate the functional annotations for the treemap and to add KEGG IDs for the pathway analysis.

Functional Annotation

This is the functional annotation of the OTUs. The treemap shows the absolute (unscaled) abundance of enzyme classes, grouped by sample.
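Assuming the treemap's enzyme classes are the top-level EC classes (1 oxidoreductases to 7 translocases), the grouping can be sketched in Python as below. The EC identifier format `EC:1.1.1.1` and the per-sample abundance values are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict

# Top-level Enzyme Commission classes
EC_CLASSES = {
    "1": "Oxidoreductases", "2": "Transferases", "3": "Hydrolases",
    "4": "Lyases", "5": "Isomerases", "6": "Ligases", "7": "Translocases",
}


def enzyme_class_totals(
        ec_abundance: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Sum predicted EC abundances per sample and top-level enzyme class.

    `ec_abundance` maps an EC identifier (assumed format 'EC:1.1.1.1')
    to {sample name: predicted abundance}.
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for ec_id, per_sample in ec_abundance.items():
        first_digit = ec_id.split(":")[-1].split(".")[0]
        enzyme_class = EC_CLASSES.get(first_digit, f"EC class {first_digit}")
        for sample, value in per_sample.items():
            totals[sample][enzyme_class] += value
    # Convert to plain dicts for printing / downstream plotting
    return {sample: dict(classes) for sample, classes in totals.items()}


ec_abundance = {
    "EC:1.1.1.1": {"S1": 10.0, "S2": 2.0},
    "EC:1.2.1.3": {"S1": 5.0},
    "EC:3.2.1.23": {"S2": 7.0},
}
print(enzyme_class_totals(ec_abundance))
# {'S1': {'Oxidoreductases': 15.0}, 'S2': {'Oxidoreductases': 2.0, 'Hydrolases': 7.0}}
```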

Pathways analysis

The heatmap shows the relative abundance (on a 0-1 scale) of the pathways in which the OTUs were predicted to participate.
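The report does not specify how the pathway abundances were brought onto the 0-1 scale. One common option, assumed in this Python sketch, is min-max scaling of each pathway across samples, so that 0 marks the sample with the lowest abundance of that pathway and 1 the highest. Pathway names and values are hypothetical.

```python
from typing import Dict, List


def minmax_scale_rows(matrix: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale each pathway's abundances across samples to the range [0, 1].

    Pathways with identical abundance in every sample are mapped to all zeros.
    """
    scaled: Dict[str, List[float]] = {}
    for pathway, values in matrix.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        scaled[pathway] = [(v - lo) / span if span else 0.0 for v in values]
    return scaled


# Hypothetical pathway abundances for three samples
pathways = {"PathwayA": [2.0, 4.0, 6.0], "PathwayB": [3.0, 3.0, 3.0]}
print(minmax_scale_rows(pathways))
# {'PathwayA': [0.0, 0.5, 1.0], 'PathwayB': [0.0, 0.0, 0.0]}
```

Dividing each value by its sample total is the other common reading of "relative abundance"; that yields proportions that sum to 1 within each sample rather than a per-pathway 0-1 range.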